Background: Pedigrees with multiple genotyped family members have been underutilised in breast cancer (BC) genetic-association
studies. We developed a pedigree-based analytical framework to characterise single-nucleotide polymorphism (SNP)
associations with BC risk using data from 736 BC families ascertained through multiple affected individuals. On average, eight 
family members had been genotyped for 24 SNPs previously associated with BC. 

Methods: Breast cancer incidence was modelled on the basis of SNP effects and residual polygenic effects. Relative risk (RR) 
estimates were obtained by maximising the retrospective likelihood (RL) of observing the family genotypes conditional on all 
disease phenotypes. Models were extended to assess parent-of-origin effects (POEs). 

Results: Thirteen SNPs were significantly associated with BC under the pedigree RL approach. This approach yielded estimates 
consistent with those from large population-based studies. Logistic regression models ignoring pedigree structure generally gave 
larger RRs and association P-values. SNP rs3817198 in LSP1, previously shown to exhibit POE, yielded maternal and paternal RR 
estimates that were similar to those previously reported (paternal RR = 1.12 (95% confidence interval (CI): 0.99-1.27), P = 0.081,
one-sided P = 0.04; maternal RR = 0.94 (95% CI: 0.84-1.06), P = 0.33). No other SNP exhibited POE.

Conclusion: Our pedigree-based methods provide a valuable and efficient tool for characterising genetic associations with BC risk 
or other diseases and can complement population-based studies. 

Large genome-wide association studies have identified several
common genetic variants associated with complex diseases. To
date, more than 60 common breast cancer (BC) susceptibility
alleles have been identified (Cox et al, 2007; Easton et al, 2007;
Stacey et al, 2007, 2010; Ahmed et al, 2009; Thomas et al, 2009;
Antoniou et al, 2010; Turnbull et al, 2010; Fletcher et al, 2011;
Milne et al, 2011; Ghoussaini et al, 2012; Hein et al, 2012;
Michailidou et al, 2013). At the time of the present analysis, 24 
common alleles were known to be involved in BC susceptibility. 
However, recent studies based on genotyping of the iCOGS custom 
array have since identified 47 additional common BC susceptibility 
alleles (Couch et al, 2013; Garcia-Closas et al, 2013; Gaudet et al, 
2013; Michailidou et al, 2013). 

Genome-wide association studies have usually used samples of 
unrelated cases and unrelated controls to evaluate evidence of 
associations and obtain relative risk (RR) estimates. Family-based 
data, where several family members are genotyped, could be an 
additional resource to assess such associations and for characteris- 
ing the risks conferred by genetic susceptibility variants, yet they 
are underutilised (Galvan et al, 2010). This approach is appealing 
because common alleles conferring increased disease risk are 
expected to cluster in families exhibiting disease family history 
(FH). Furthermore, with pedigree data it is possible to estimate 
genetic parent-of-origin specific risks depending on whether a risk 
allele was inherited from the father or mother, which is not 
possible under a population-based study design. Standard case- 
control analysis methods are not optimal for estimating the risks 
conferred by single-nucleotide polymorphisms (SNPs) in situations 
where families are ascertained on the basis of multiple disease 
cases. Analysing pedigree data using standard analytical methods 
(e.g., logistic regression) could lead to biased association estimates 
as they do not account for correlations in genotypes between 
related individuals. In addition, they do not adjust for the fact that 
families may be ascertained on the basis of multiple affected family 
members and that SNPs (or other genetic factors) are expected to 
be correlated with FH of the disease. The retrospective likelihood 
(RL) approach has been shown to adjust for ascertainment bias 
when ascertainment of individuals or families is non-random with 
respect to disease phenotype (Carayol and Bonaiti-Pellie, 2004). 
This approach involves modelling the likelihood of the observed 
family genotypes conditional on family disease phenotypes. We 
developed pedigree RL methods for assessing associations with 
genetic variants and estimating the associated risks in the context 
of genetic susceptibility to BC. This approach takes the form of a 
modified segregation analysis that accounts for explicit correlations 
in genotypes between related individuals while adjusting for 
ascertainment. 

At the time of analysis, 24 SNPs had been shown to be 
associated with BC risk, primarily through large population-based 
case-control studies (Supplementary Table 1) (Cox et al, 2007; 
Easton et al, 2007; Stacey et al, 2007, 2010; Ahmed et al, 2009; 
Thomas et al, 2009; Antoniou et al, 2010; Turnbull et al, 2010; 
Fletcher et al, 2011; Milne et al, 2011; Ghoussaini et al, 2012; Hein 
et al, 2012). We applied the pedigree RL approach to estimate SNP 
associations with BC risk using data from 736 families recruited on 
the basis of strong FH of BC and a set of unrelated unaffected 
controls. Our results were contrasted to those obtained from 
standard analytical methods such as logistic regression. 

There has been criticism of the assumption in association 
studies that maternally and paternally inherited alleles are 
functionally equivalent (Guilmatre and Sharp, 2012). Three 
mechanisms to describe parent-of-origin effects (POEs) have been 
suggested: (i) the influence of the maternal intrauterine environment
on fetal development; (ii) expression of genetic variation
from the maternally inherited mitochondrial genome; and (iii) 
epigenetic regulation of gene expression, for example, genomic 
imprinting (suppression of gene expression that has been passed 
from one parent's germline) (Falls et al, 1999; Haghighi and 
Hodge, 2002; Rampersaud et al, 2008). Classic examples of 
imprinting are Prader-Willi and Angelman syndromes, which 
can occur when the same region on chromosome 15 is either 
maternally or paternally imprinted, respectively (Falls et al, 1999). 
A previous study found that one of the BC susceptibility variants
that we analysed, SNP rs3817198 in the 11p15 region (LSP1 gene),
displayed POE with BC risk (Kong et al, 2009). Analysing data 
under a POE-type analysis, the paternally inherited allele expressed 
a significant association (OR=1.17, 95% CI: 1.05-1.30, 
P = 0.0038), whereas the maternally inherited allele did not 
(OR = 0.91, 95% CI: 0.81-1.02, P=0.11). These observations are 
consistent with reports that the 11p15 region hosts a cluster of
imprinted genes, some of which may be related to BC risk 
(Berteaux et al, 2008). The results presented by Kong et al (2009) 
indicate a paternal effect of this locus on BC risk. These findings 
have not yet been replicated. We extended our pedigree RL 
framework to examine POE by estimating RRs separately for a 
maternally and paternally inherited risk allele. This is not possible 
under a standard case-control analytical design. We evaluated 
these associations for all BC susceptibility alleles investigated. 

We further used the available genotype data to compute a 
combined observed genotype risk score to investigate whether this 
risk score can discriminate between women with FH of BC and 
unaffected women. 



MATERIALS AND METHODS 



Study sample. The Kathleen Cuningham Foundation Consortium 
for Research into Familial Breast Cancer (kConFab) enrols families 
with multiple cases of breast and/or ovarian cancer from Australia 
and New Zealand (Kathleen Cuningham Foundation Consortium 
for research into Familial Breast Cancer (kConFab), 2012). To date, 
kConFab has enrolled over 1400 families. The Australian Ovarian 
Cancer Study (AOCS) has recruited over 1800 ovarian cancer cases 
and 1000 population-based controls (Australian Ovarian Cancer 
Study (AOCS), 2012). 

Our analyses considered data from 798 kConFab families. 
Eligibility was restricted to families with at least one family 
member genotyped for the SNPs of interest. Families were
systematically screened for mutations in BRCA1, BRCA2 and ATM
and were excluded if one was found. We also excluded families if at
least one family member was found to have a mutation in any of
the CHEK2, TP53, PTEN, RAD51C, MLH1 or MSH2 genes, although
screening of these genes was less systematic. In total, 736 families
were eligible for analysis. A total of 897 unaffected population- 
based controls from AOCS were also included. 

Mendelian inconsistencies in genotype transmission from 
parents to offspring were tested using PedCheck (O'Connell and 
Weeks, 1998). Detected Mendelian inconsistencies were rectified 
by first clarifying family relationships. Where this was not possible,
we set inconsistent genotypes to missing in such a way that as little
genetic data as possible were lost and Mendelian consistency held
throughout the remainder of the pedigree.
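
As a rough illustration of the kind of check described above, the following sketch tests whether a father-mother-child trio is consistent with Mendelian transmission at a single biallelic SNP. It is not PedCheck's algorithm; the function names and the 0/1/2 genotype coding (number of copies of one allele) are assumptions made here for illustration only.

```python
def possible_gametes(genotype):
    """Allele counts (0 or 1) that a parent with this genotype can transmit.

    genotype is the number of copies of the reference allele carried: 0, 1 or 2.
    """
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def is_mendelian_consistent(father, mother, child):
    """True if the child's genotype can arise from the two parental genotypes."""
    return any(
        paternal + maternal == child
        for paternal in possible_gametes(father)
        for maternal in possible_gametes(mother)
    )


def flag_inconsistent_trios(trios):
    """Return the (father, mother, child) trios that violate Mendelian transmission."""
    return [trio for trio in trios if not is_mendelian_consistent(*trio)]
```

A trio flagged by such a check would then be resolved either by correcting the recorded relationships or by setting the offending genotype to missing.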

Genotyping. SNPs were genotyped using MALDI-TOF spectrophotometric
mass determination of allele-specific primer extension
products with the Sequenom MassARRAY platform (Sequenom, Inc.,
San Diego, CA, USA) and iPLEX Gold technology (Sequenom,
Inc.). Primer design was carried out according to Sequenom
guidelines using MassARRAY Assay Design software (version 3.0). 
Multiplex PCR amplification of fragments containing target SNPs 
was performed using Qiagen HotStart Taq Polymerase (QIAGEN, 
Hilden, Germany) and a PerkinElmer GeneAmp 2400 thermal 
cycler (PerkinElmer, Waltham, MA, USA) with 10 ng genomic 
DNA in 384 well plates. Shrimp Alkaline Phosphatase and allele- 
specific primer extension reactions were carried out according to 
the manufacturer's instructions for iPLEX Gold chemistry. Assay 
data were analysed using Sequenom TYPER software (version 3.4). 
Cluster plots were visually inspected and standard quality-control
measures were checked, including Hardy-Weinberg equilibrium
P ≥ 0.01, plate call rate ≥ 95% and duplicate concordance rate
≥ 98% (of 5% duplicated samples).
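
The text does not say which Hardy-Weinberg test was used in quality control. The sketch below assumes the standard 1-d.f. chi-square goodness-of-fit test on genotype counts; the function name and arguments are hypothetical, and an exact test could equally have been used.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 d.f.) of Hardy-Weinberg proportions from genotype counts.

    Returns (chi2, p_value). Assumes at least one genotyped sample.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip cells with zero expectation (monomorphic SNP) to avoid dividing by zero.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

A SNP would be retained when the returned P-value meets the threshold quoted above.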

Analytical framework. We assumed an underlying genetic model 
where BC susceptibility is explained by the genetic variant of 
interest and a residual polygenic component that represents the 
multiplicative effects of several loci, each of which makes a
small contribution to disease risk. The disease incidence, $\lambda_i(t)$,
was assumed to depend on the genetic effects through a model of
the form:

$$\lambda_i(t) = \lambda_0(t) \exp[\beta g_i + P_i]$$

where $\lambda_0(t)$ is the baseline incidence, $\beta$ is the per-allele log RR,
$g_i$ = {0,1,2} is the SNP genotype for individual i and $P_i$ is the
polygenic component assumed to be normally distributed:

$$P_i \sim N(0, \sigma_R^2)$$

where $\sigma_R^2$ is the residual polygenic variance. Because all families
found to segregate BRCA1 or BRCA2 mutations, as well as
some rarer mutations in other susceptibility genes, were excluded,
this model is plausible for the families we analysed. We constrained
the sum of the variance of the measured locus of interest, $\sigma_K^2$, and
the residual polygenic variance, $\sigma_R^2$, such that they agree with
external estimates of the total polygenic variance $\sigma_P^2$ (Antoniou
et al, 2002). Hence,

$$\sigma_P^2 = \sigma_K^2 + \sigma_R^2 \qquad (1)$$



This is in line with a multiplicative assumption between the
measured locus and polygenic component. A previous segregation
analysis estimated $\sigma_P = 1.29$ (Antoniou et al, 2002). Under the
polygenic model, $\exp(\sigma_P^2)$ is the coefficient of variation in
incidences (Risch, 1990). $\exp(\sigma_P^2)$ is also the familial RR (FRR)
to the monozygotic twin of an affected individual ($\lambda_M$), such that
$\lambda_M = \exp(\sigma_P^2)$. Under the assumed model, it has previously been
shown that the variance of the locus of interest, $\sigma_K^2$, will be
given by $\log(\lambda_{MK})$, where $\lambda_{MK}$ is the FRR to a monozygotic twin
due to the locus on its own (Risch, 1990; Antoniou and Easton,
2003). Therefore, the known component of the polygenic variance
was calculated as:

$$\sigma_K^2 = \log\left[\frac{\sum_g \tau_g \exp[2\beta g]}{\left(\sum_g \tau_g \exp[\beta g]\right)^2}\right] = \log\left[\frac{\tau_0 + \tau_1 \exp[2\beta] + \tau_2 \exp[4\beta]}{\left(\tau_0 + \tau_1 \exp[\beta] + \tau_2 \exp[2\beta]\right)^2}\right] \qquad (2)$$

where $\tau_g$ is the frequency of genotype g = {0,1,2} calculated under
the Hardy-Weinberg equilibrium assumption (Antoniou and
Easton, 2003). The polygenic component was approximated by
the hypergeometric polygenic model (Fernando et al, 1994; Lange, 
1997; Antoniou et al, 2001). 
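
To make the variance decomposition in equations (1) and (2) concrete, here is a minimal sketch that computes the variance attributable to a single SNP and the residual polygenic variance. It assumes Hardy-Weinberg genotype frequencies and reads the external estimate quoted above as a polygenic standard deviation of 1.29; the function and constant names are introduced here for illustration and are not part of the original analysis code.

```python
import math

SIGMA_P = 1.29  # assumed total polygenic standard deviation (external estimate)


def hwe_genotype_frequencies(raf):
    """Frequencies of genotypes carrying 0, 1 and 2 risk alleles under HWE."""
    q = 1.0 - raf
    return (q * q, 2.0 * raf * q, raf * raf)


def known_variance(raf, beta):
    """Variance due to the measured locus, following equation (2)."""
    tau = hwe_genotype_frequencies(raf)
    second_moment = sum(t * math.exp(2.0 * beta * g) for g, t in enumerate(tau))
    first_moment = sum(t * math.exp(beta * g) for g, t in enumerate(tau))
    return math.log(second_moment / first_moment ** 2)


def residual_variance(raf, beta, sigma_p=SIGMA_P):
    """Residual polygenic variance implied by equation (1)."""
    return sigma_p ** 2 - known_variance(raf, beta)


def percent_of_polygenic_variance(raf, beta, sigma_p=SIGMA_P):
    """Share of the total polygenic variance explained by the locus, in percent."""
    return 100.0 * known_variance(raf, beta) / sigma_p ** 2
```

For instance, with a risk allele frequency of 0.398 and a per-allele RR of 1.26, `percent_of_polygenic_variance(0.398, math.log(1.26))` comes to roughly 1.6%, which illustrates how small the contribution of a single common SNP is under this model.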

We assumed a censoring process such that an individual was 
followed from birth until the age at first BC diagnosis, age of death, 
age at last observation or at 80 years of age, whichever occurred 
first. Individuals censored at 80 years of age were censored as 
unaffected at this time point. We assumed men were not at risk of 
developing BC. In the instance of no available censoring age, we 
censored at 0 years. 
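
The censoring rules above can be sketched as a small helper. The function name, argument names and the handling of ties (a diagnosis at the same age as death counts as affected) are assumptions made here; the original analysis may have resolved such edge cases differently.

```python
def censor(age_at_bc=None, age_at_death=None, age_last_seen=None,
           female=True, max_age=80):
    """Return (censoring_age, affected) for one individual.

    Follow-up ends at the earliest of first BC diagnosis, death, last
    observation or max_age. Men are treated as not at risk, so a BC
    diagnosis in a man is ignored. With no age information at all, the
    individual is censored at age 0.
    """
    bc_age = age_at_bc if female else None
    observed = [a for a in (bc_age, age_at_death, age_last_seen) if a is not None]
    if not observed:
        return 0, False
    age = min(observed + [max_age])
    # Affected only if the diagnosis is the censoring event and occurs before max_age.
    affected = bc_age is not None and bc_age == age and bc_age < max_age
    return age, affected
```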

The BC incidences were constrained over all genetic effects 
(Antoniou et al, 2001) to agree with the Australian female BC 
incidences for the 1993-1997 calendar period (International 
Agency for Research on Cancer (IARC), 2010). 



To obtain parameter estimates, we maximised the likelihood over the
genotype frequencies and log RR. We also fitted models where 
no residual polygenic effect was assumed in order to investigate the 
effect on parameter estimates when no assumptions were made 
about the residual familial clustering of BC. 
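
Written schematically, the retrospective likelihood for one family conditions the genotypes on the phenotypes. The notation below is introduced here for illustration: $G$ denotes the observed family genotypes, $D$ the family disease phenotypes (affection status and ages), $p$ the allele frequency and $G^{*}$ any genotype configuration compatible with the pedigree.

```latex
L(\beta, p) \;=\; P(G \mid D)
  \;=\; \frac{P(D \mid G;\, \beta)\, P(G;\, p)}
             {\sum_{G^{*}} P(D \mid G^{*};\, \beta)\, P(G^{*};\, p)}
```

In this sketch, $P(G; p)$ would follow from Hardy-Weinberg proportions in founders and Mendelian transmission to offspring, and $P(D \mid G; \beta)$ from the incidence model. Genotypes of untyped relatives and, under the polygenic model, the polygenic component would additionally be summed out; those steps are omitted here.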

Parent-of-origin effects. The pedigree RL framework was 
extended to account for POE. Here we simultaneously model the 
risk associated with a maternally inherited allele and paternally 
inherited allele. We denote the maternal log RR as $\beta_m$ and the paternal
log RR as $\beta_p$. The indicator variable $g_{i,m}$ takes the value 1 if
individual i carries a maternally inherited risk allele and 0 otherwise;
$g_{i,p}$ is defined similarly for a paternally inherited risk allele.
Under this model, the disease incidence had the form:

$$\lambda_i(t) = \lambda_0(t) \exp[\beta_m g_{i,m} + \beta_p g_{i,p}]$$

We jointly maximised the likelihood over allele frequencies and 
both the maternal and paternal log RRs to obtain estimates for 
these parameters. 

We evaluated evidence for POE by testing for differences 
between the maternal log RR and paternal log RR using a 
likelihood ratio test. For this purpose, the likelihood obtained from 
the POE model was compared with the likelihood under a single 
gene model that estimated a single per-allele HR assuming the 
same effect for maternally and paternally inherited risk alleles. 

As the primary aim of the POE analysis was to test for equality in 
the paternal and maternal log RRs, the polygenic component was 
omitted. This was in order to reduce the computational complexity. 
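
The likelihood ratio test for POE compares a model with two log RRs (maternal and paternal) against a model with a single per-allele log RR, so the test statistic has one degree of freedom. A minimal sketch, with hypothetical function names and using only the standard library:

```python
import math


def chi2_sf_one_df(chi2):
    """Upper-tail probability of a chi-square variable with 1 d.f."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def lrt_one_df(loglik_poe, loglik_single):
    """Likelihood ratio test of the POE model against the single-effect model.

    loglik_poe: maximised log-likelihood with separate maternal/paternal log RRs.
    loglik_single: maximised log-likelihood with one per-allele log RR.
    Returns (chi2, p_value).
    """
    chi2 = max(0.0, 2.0 * (loglik_poe - loglik_single))
    return chi2, chi2_sf_one_df(chi2)
```

As a check against Table 4, `chi2_sf_one_df(0.504)` gives approximately 0.478, matching the P-value reported there for rs2981582.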

Logistic regression analyses. Standard logistic regression analyses 
were performed for comparison purposes. To account for 
relatedness within families, we estimated robust s.e. (Huber, 
1967; White, 1980, 1982). Two types of analyses were undertaken: 
(i) unaffected AOCS controls vs all affected kConFab female family 
members and (ii) unaffected AOCS controls vs one selected 
affected kConFab female per family (usually the female family 
member that led to family ascertainment). 

Assessing discrimination based on SNP profiles. To evaluate the 
ability of SNP profiles to discriminate between unaffected women 
and affected women with FH of BC, we computed an observed risk 
score (ORS) for each individual. The score, $S_i$, for individual i
based on the combined effects of all SNPs was given by:

$$S_i = \sum_{j=1}^{S} \beta_j g_{ji}$$

where S is the number of SNPs, $\beta_j$ is the published population-based
estimate of the per-allele log OR (Supplementary Table 1) and
$g_{ji}$ = {0,1,2} is the observed genotype for individual i at SNP j. The
ORS was calculated for a single affected female family member who 
had been genotyped for all SNPs and all controls. The discriminatory 
ability of the ORS was evaluated using receiver operating character- 
istic (ROC) curves by calculating the area under the curve (AUC). 
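
The score and its discriminatory ability can be sketched as follows. The AUC is computed here as the standard nonparametric (Mann-Whitney) estimate; the function names are hypothetical and this is not the Stata code used in the analysis.

```python
def observed_risk_score(genotypes, log_odds_ratios):
    """Sum of per-allele log ORs weighted by risk-allele counts (0, 1 or 2)."""
    if len(genotypes) != len(log_odds_ratios):
        raise ValueError("one log OR is needed per genotyped SNP")
    return sum(b * g for b, g in zip(log_odds_ratios, genotypes))


def auc(scores_affected, scores_unaffected):
    """Probability that a random affected woman scores higher than a random
    unaffected woman, counting ties as one half."""
    wins = 0.0
    for a in scores_affected:
        for u in scores_unaffected:
            if a > u:
                wins += 1.0
            elif a == u:
                wins += 0.5
    return wins / (len(scores_affected) * len(scores_unaffected))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of affected from unaffected women.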

Statistical software. Logistic regression and ROC analyses were 
performed using Stata version 11.1 (StataCorp LP, 2009). The 
segregation and POE models were implemented using pedigree 
analysis software MENDEL (Lange et al, 1988). 



RESULTS 



Retrospective likelihood segregation models. Because families
were ascertained on the basis of multiple affected family members,
we modelled the RL of observing family genotypes conditional on
family disease phenotypes. The likelihood was parameterised in
terms of the allele frequency and per-allele log RRs ($\beta$).



Study population. After quality-control checks, 736 kConFab
families with at least one genotyped individual, comprising 45 822 
individuals, and 897 unrelated unaffected controls from AOCS 
were eligible for analyses. Sample characteristics are summarised in 
Table 1. In brief, 6907 individuals were genotyped for at least one 
SNP. Of these, 1673 (24.2%) were male and 5234 (75.8%) were
female. In total, 1590 (30.4%) affected females and 3644 (69.6%) 
unaffected females were genotyped. The average number of 
individuals genotyped in these families was eight. 

Single SNP association results using logistic regression and 
segregation analyses. Tables 2 and 3 display logistic regression 
and segregation analysis results. Figure 1 shows a comparison of 
log RR estimates under different analytical models. 

Single gene models. Fourteen SNPs were significantly associated 
with BC risk at the 5% significance level when data were analysed 
under a single gene model that does not allow for residual 
polygenic effects. The most significant association was FGFR2 SNP 
rs2981582 (HR = 1.20, 95% CI: 1.13-1.27, P = 6.75 × 10⁻¹⁰).

Incorporating residual polygenic effects. Thirteen SNPs were 
significantly associated with BC risk (5% significance level) when 
data were analysed under the model allowing for residual familial 
clustering in terms of a polygenic component. All these SNPs were 
significantly associated when the data were analysed under the 
single gene model. C6orf97 SNP rs12662670 was the only SNP
significantly associated under the single gene model that was not 



Table 1. Summary of the kConFab and AOCS study populations

Study                                     kConFab          AOCS
n                                         45 822           897
  Males/female                            23 415/22 407    0/897
Pedigrees                                 736              897
  Unaffected/affected                     42 709/3113      897/0
  Unaffected/affected (females only)      19 294/3113      897/0
n genotyped (at least one SNP)            6010             897
  Male/female                             1673/4337        0/897
  Unaffected/affected                     4420/1590        897/0
  Unaffected/affected (females only)      2747/1590        897/0
n genotyped (22 risk prediction SNPs)     574              715
  Male/female                             14/560           0/715
  Unaffected/affected                     79/495           0/715
  Unaffected/affected (females only)      65/495           0/715
n genotyped (all 24 SNPs)                 564              714
  Male/female                             14/550           0/714
  Unaffected/affected                     79/485           0/714
  Unaffected/affected (females only)      65/485           0/714
Mean (s.d.) censoring age (unaffected)    45.00 (23.75)    57.37 (11.62)
  Censored aged ≥18 years                 52.29 (18.83)    57.37 (11.62)
  Females only                            38.41 (27.10)    57.37 (11.62)
  Females censored aged ≥18 years         51.90 (19.23)    57.37 (11.62)
Mean (s.d.) censoring age (affected)      51.50 (12.12)    N/A
  Censored aged ≥18 years                 51.50 (12.12)    N/A
  Females only                            51.50 (12.12)    N/A
  Females censored aged ≥18 years         51.50 (12.12)    N/A

Abbreviations: AOCS = Australian Ovarian Cancer Study; kConFab = Kathleen Cuningham
Foundation Consortium for Research into Familial Breast Cancer; n = number of individuals
in sample. Censoring age in years.



associated with risk under the model that incorporates polygenic 
background (single gene P = 3.64 × 10⁻⁴; polygenic P = 0.086).
Overall, P-values of association were similar under both pedigree
analysis models (Figure 2). As with the single gene model, FGFR2
SNP rs2981582 provided the strongest association with BC risk
(HR = 1.26, 95% CI: 1.17-1.36, P = 9.04 × 10⁻¹⁰). For SNPs
providing evidence of association (P<0.05), the effect size 
estimates were somewhat larger under the model allowing for 
polygenic background but the strength of association was generally 
similar. The estimated HRs under the polygenic model were closer 
to OR estimates obtained from population-based studies than the 
estimates under the model that did not allow for polygenic 
background (Figure 1). 

SNPs that were significantly associated with risk accounted for 
between 0.20 and 1.62% of the total polygenic variance, but most 
SNPs accounted for <1%. Only two SNPs, rs2981582 in FGFR2
and rs13387042 at 2q35, accounted for >1% of the total polygenic
variance. 

A comparison of estimates of association from the segregation 
analyses to those obtained from the naive standard case-control 
analyses revealed that logistic regression typically overestimated 
associations. For almost all SNPs, the absolute value of the 
estimated log OR from the logistic regression comparing AOCS 
controls against all female cases exceeded those obtained under the 
segregation models. Moreover, the logistic regression ORs lay
outside the CIs of the population-based OR estimates more often than
the estimates from the segregation analysis models (Supplementary Figure 1).

Parent-of-origin effects. The POE segregation analyses were 
performed assuming no residual polygenic background. This is a 
reasonable assumption as the primary aim was to test for 
differences in paternal and maternal HRs. Moreover, the pedigree 
analysis becomes complex because of the implementation of the 
hypergeometric approximation to the polygenic model. Results for 
POE analyses are given in Table 4. 

Two SNPs showed significant associations with the paternally 
inherited allele only. Five SNPs yielded significant associations with 
the maternally inherited allele only. The HR estimate for the 
paternally inherited allele of SNP rs3817198 in LSP1 was 1.12 (95% 
CI: 0.99-1.27, P = 0.081). Under a one-sided test of HR > 1,
the P-value was 0.04.
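
The relation between the two-sided and one-sided P-values can be illustrated with a Wald-type reconstruction from a reported HR and its 95% CI. This is only a sketch with a hypothetical function name: it works from rounded published values, so it reproduces the reported P-values only approximately, and the original P-values may have been derived differently (for example from unrounded estimates).

```python
import math


def wald_p_values(hr, ci_low, ci_high, z_crit=1.959964):
    """Approximate z statistic and P-values from an HR and its 95% CI.

    Returns (z, two_sided_p, one_sided_p), where the one-sided test has the
    alternative hypothesis HR > 1.
    """
    log_hr = math.log(hr)
    # Standard error recovered from the width of the CI on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = log_hr / se
    two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail probability
    return z, two_sided, one_sided
```

When the estimate lies in the hypothesised direction (HR > 1), the one-sided P-value is exactly half the two-sided one, which is why a two-sided P of 0.081 corresponds to a one-sided P of about 0.04.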

One SNP, rs13387042 at 2q35, showed statistically significant
associations for both a paternally inherited (HR = 1.20, 95% CI:
1.04-1.37, P = 0.0096) and maternally inherited risk allele
(HR = 1.16, 95% CI: 1.03-1.31, P = 0.014). No SNP exhibited
significant differences between HR estimates for the maternally 
and paternally inherited allele (P-value range: 0.07-0.95). 

Risk score comparisons. Two SNPs at 19p13 (rs2363956 and
rs8170) were excluded when constructing risk scores as they are 
primarily associated with ER-negative BC risk (Antoniou et al, 
2010). The mean (s.d.) ORS was 2.47 (0.40) in 1147 individuals 
(715 unaffected and 432 affected) genotyped for all 22 SNPs. There 
was a significant difference in the mean ORS between unaffected 
(mean ORS (s.d.) = 2.40 (0.39)) and affected (2.60 (0.39)) women 
(P = 6.38 × 10⁻¹⁷). The estimated AUC was 0.642 (95% CI:
0.610-0.675) (Figure 3). 

As expected, the distribution of the ORS for unaffected women 
from the kConFab families, that is women with FH of BC, lies 
between the risk distributions of the population-based controls and 
affected women (Supplementary Figure 2). 



DISCUSSION 



In this article, we developed an analytical framework to estimate 
associations between SNPs and BC risk within a pedigree setting. 
This approach provides an efficient method for investigating 

Table 2. Logistic regression analysis results

                       AOCS controls vs all female cases                  AOCS controls vs one selected female case per family
SNP         Pedigrees  Affected/    RAF a  OR (95% CI) b     P-value      Affected/    RAF a  OR (95% CI) b     P-value
                       unaffected                                         unaffected
rs2981582   1577       1460/892     0.398  1.44 (1.26-1.65)  1.46 × 10⁻⁷  837/892      0.398  1.36 (1.18-1.58)  0.00003
rs1975930   1515       1504/813     0.106  0.75 (0.60-0.94)  0.01115      812/813      0.106  0.74 (0.58-0.95)  0.01788
rs10941679  1485       719/873      0.258  1.16 (0.99-1.36)  0.07327      601/873      0.258  1.19 (1.01-1.41)  0.03899
rs3803662   1558       1461/872     0.274  1.30 (1.13-1.50)  0.00024      845/872      0.274  1.31 (1.12-1.52)  0.00069
rs2046210   1563       1471/874     0.348  1.14 (1.00-1.31)  0.05218      846/874      0.348  1.17 (1.01-1.35)  0.03836
rs614367    1607       1571/891     0.158  1.19 (1.01-1.40)  0.04325      877/891      0.158  1.20 (1.00-1.44)  0.05003
rs10509168  1608       1563/892     0.474  0.80 (0.71-0.92)  0.00124      846/892      0.474  0.80 (0.69-0.92)  0.00205
rs1292011   1433       823/812      0.432  0.95 (0.82-1.10)  0.50408      574/812      0.432  0.95 (0.81-1.11)  0.50687
rs13387042  1551       1452/863     0.483  1.41 (1.24-1.61)  2.61 × 10⁻⁷  790/863      0.483  1.38 (1.20-1.60)  0.00001
rs13281615  1561       1422/874     0.402  1.19 (1.04-1.35)  0.01090      838/874      0.402  1.16 (1.01-1.34)  0.03900
rs865686    1511       1426/812     0.393  0.90 (0.78-1.03)  0.13578      775/812      0.393  0.86 (0.73-1.00)  0.04650
rs11249433  1565       1474/875     0.413  1.11 (0.97-1.27)  0.11657      847/875      0.413  1.08 (0.94-1.25)  0.29256
rs2823093   1511       1489/813     0.252  0.97 (0.84-1.14)  0.74598      792/813      0.252  0.92 (0.78-1.09)  0.35286
rs3817198   1562       1463/873     0.324  1.10 (0.96-1.26)  0.18494      846/873      0.324  0.98 (0.84-1.14)  0.81915
rs889312    1560       1462/871     0.280  1.13 (0.98-1.31)  0.08430      840/871      0.280  1.14 (0.97-1.33)  0.10425
rs1011970   1608       1573/892     0.161  1.12 (0.95-1.32)  0.18912      860/892      0.161  1.08 (0.89-1.30)  0.43456
rs17468277  1564       1468/875     0.141  0.79 (0.65-0.96)  0.01540      853/875      0.141  0.78 (0.63-0.97)  0.02331
rs999737    1528       1492/831     0.746  1.16 (0.99-1.35)  0.05823      853/831      0.746  1.16 (0.99-1.37)  0.07231
rs2380205   1580       898/891      0.408  1.15 (1.00-1.32)  0.05295      636/891      0.408  1.09 (0.95-1.26)  0.22967
rs4973768   1558       1439/873     0.455  1.22 (1.07-1.39)  0.00289      814/873      0.455  1.22 (1.06-1.41)  0.00507
rs6504950   1563       1468/875     0.302  0.92 (0.80-1.06)  0.26037      858/875      0.302  0.96 (0.82-1.11)  0.56381
rs2363956   1507       1388/813     0.502  1.06 (0.92-1.21)  0.42933      775/813      0.502  1.06 (0.92-1.23)  0.43906
rs8170      1512       1496/813     0.191  1.02 (0.86-1.21)  0.78295      806/813      0.191  1.04 (0.87-1.26)  0.64778
rs12662670  1514       1500/813     0.065  1.60 (1.25-2.04)  0.00016      799/813      0.065  1.67 (1.28-2.17)  0.00015

Abbreviations: AOCS = Australian Ovarian Cancer Study; CI = confidence interval; OR = odds ratio; RAF = risk allele frequency; SNP = single-nucleotide polymorphism.
a RAF is the observed risk allele frequency in unaffected individuals.
b Per-allele OR is reported such that the effect allele is the same as those from the population-based studies (Supplementary Table 1).



associations of polymorphisms on disease risk. We extended these 
methods to estimate parent-of-origin associations by separately 
estimating HRs for maternally and paternally inherited risk alleles. 
This is the first time POE have been evaluated for most of the 
common genetic variants found to be associated with BC risk. 
Although we demonstrate these methods in the context of 
evaluating associations with BC risk, the principles are applicable
not only to other cancers but also to other complex diseases that exhibit
familial aggregation. 

We applied these methods to family data from kConFab, a 
family-based study in which families were recruited through 
multiple relatives diagnosed with breast and/or breast/ovarian 
cancer. Analysing such associations using standard analytical 
methods could yield biased association estimates because families
are ascertained non-randomly with respect to disease
phenotype and because genetic variants are likely to be correlated
with FH of disease. Analysing data within a pedigree RL framework
accounts for relatedness and adjusts for ascertainment bias. 

Our results demonstrate that standard logistic regression 
analyses applied in this context generally overestimate the 
magnitude of disease associations when compared with estimates 
published by large collaborative studies, and they more often lay
outside the published CIs. However, estimates from the modified
segregation analysis were generally very close to, and within the CIs
of, the estimates reported by the population-based studies (Cox
et al, 2007; Easton et al, 2007; Stacey et al, 2007, 2010; Ahmed et al, 
2009; Thomas et al, 2009; Antoniou et al, 2010; Turnbull et al, 
2010; Fletcher et al, 2011; Milne et al, 2011; Ghoussaini et al, 2012; 
Hein et al, 2012). 

In addition, the segregation models generally yielded smaller
P-values for association than those obtained through the logistic
regression analysis. This suggests that this approach has greater
power to detect associations than standard case-control
analysis that ignores pedigree structure. Likely explanations
are that pedigree analysis methods model the exact
genetic correlations between relatives and that additional
information is extracted from the phenotypes of family members who
had not been genotyped. Additional gains in power would be
expected from the use of pedigree-based methods in settings where
a clear ascertainment process exists, which would involve
conditioning on the phenotypes of all family members. Therefore,
a family-based approach is a useful and efficient method to
investigate the contribution of genetic variants to disease risk.

Our models used external data on population BC incidences and
on the magnitude of the assumed polygenic variance in the
polygenic model. Sensitivity analysis by misspecifying the assumed 

Figure 1. Scatter plots of log RR estimates from published population-based studies (Supplementary Table 1) (all x-axes) vs: (A) logistic
regression estimates comparing AOCS controls against all familial cases (Table 2); (B) logistic regression estimates comparing AOCS controls
against one selected female case per family (Table 2); (C) single gene segregation model estimates (Table 3); and (D) polygenic segregation model
estimates (Table 3). The dashed line is y = x, the line of equality. ICC = intraclass correlation coefficient.

Figure 2. Scatter plot of −log10 P-values from the: (i) polygenic segregation model (Table 3); (ii) single gene segregation model (Table 3); (iii)
logistic regression A: logistic regression estimates comparing AOCS controls against all familial cases (Table 2); and (iv) logistic regression B:
logistic regression estimates comparing AOCS controls against one selected female case per family (Table 2). The dashed line represents a P-value
of 0.05, the nominal significance level. SNPs are ordered by the P-values of the polygenic segregation analysis model. The segregation models
generally yielded smaller P-values, indicating that these models have greater power to detect associations. 19p13 SNPs rs2363956 and rs8170 are
not displayed as they are associated with ER-negative BC.



population incidences to be half or double the true population 
incidences revealed small deviations in the RR estimates (relative bias 
< 3%). Similarly, varying the assumed polygenic variance to be up to 
80% of the assumed polygenic variance in our models had a 
negligible effect on the RR estimates (relative bias < 1%). This 
suggests that the estimates obtained under the methods presented are 
robust against misspecifications in the external model parameters. 

Alternative association methods using pedigree data have been
suggested. A case-only pedigree RL approach has been suggested
and applied to the analysis of associations with prostate cancer risk
(Schaid et al, 2010). However, this differs from our approach in
that it does not consider genotype data from unaffected family
members. Our approach allows for estimation of allele frequencies
and RR parameters simultaneously, whereas Schaid et al used
external allele frequency estimates. Unlike Schaid et al, our
analyses incorporated the genetic information from all family
members, therefore providing more information in the estimation
process. The genetic model employed by Schaid et al was similar to
our model in allowing for residual correlations between family
members using a random baseline risk parameter. Schaid et al found
that RRs estimated under the pedigree RL were consistent with ORs
estimated by large case-control studies, agreeing with our findings.

Table 4. Segregation analysis results allowing for parent-of-origin effects

                                      Maternal                   Paternal                   Difference
SNP         Pedigrees  RAF (s.e.) a   HR (95% CI) b     P-value  HR (95% CI) b     P-value  δ (s.e.)        χ²     P-value  logL       AIC
rs2981582   1626       0.401 (0.008)  1.26 (1.09-1.46)  0.002    1.13 (0.95-1.34)  0.157    -0.107 (0.150)  0.504  0.478    -5904.026  11814.05
rs1975930   1549       0.113 (0.005)  0.70 (0.55-0.90)  0.006    1.00 (0.83-1.20)  0.981    0.348 (0.194)   3.180  0.075    -2802.051  5610.10
rs10941679  1505       0.257 (0.010)  0.91 (0.62-1.32)  0.611    1.31 (0.94-1.81)  0.108    0.366 (0.347)   0.684  0.408    -1480.474  2966.95
rs3803662   1606       0.268 (0.007)  1.11 (0.99-1.25)  0.080    1.21 (1.07-1.38)  0.003    0.088 (0.109)   0.654  0.419    -5203.282  10412.56
rs2046210   1608       0.348 (0.007)  1.03 (0.91-1.16)  0.637    1.14 (1.00-1.30)  0.051    0.102 (0.112)   0.830  0.362    -5675.565  11357.13
rs614367    1627       0.152 (0.005)  1.23 (1.08-1.39)  0.001    0.98 (0.83-1.17)  0.863    -0.220 (0.134)  2.700  0.100    -3940.095  7886.19
rs10509168  1628       0.466 (0.008)  0.84 (0.75-0.95)  0.007    0.98 (0.86-1.12)  0.769    0.151 (0.118)   1.588  0.208    -5957.49   11920.98
rs1292011   1455       0.433 (0.011)  1.17 (0.92-1.49)  0.205    0.78 (0.59-1.03)  0.082    -0.400 (0.252)  1.628  0.202    -2176.575  4359.15
rs13387042  1598       0.489 (0.008)  1.16 (1.03-1.31)  0.014    1.20 (1.04-1.37)  0.010    0.030 (0.115)   0.068  0.795    -5904.889  11815.78
rs13281615  1608       0.401 (0.008)  1.09 (0.96-1.22)  0.178    1.10 (0.96-1.27)  0.155    0.017 (0.116)   0.022  0.882    -5585.132  11176.26
rs865686    1547       0.386 (0.008)  0.99 (0.87-1.14)  0.934    0.91 (0.77-1.06)  0.214    -0.093 (0.136)  0.468  0.494    -5397.179  10800.36
rs11249433  1609       0.414 (0.008)  1.06 (0.94-1.19)  0.346    1.05 (0.92-1.20)  0.486    -0.009 (0.114)  0.006  0.937    -5949.532  11905.06
rs2823093   1549       0.261 (0.007)  0.89 (0.77-1.02)  0.104    1.03 (0.89-1.20)  0.669    0.152 (0.133)   1.292  0.256    -4881.996  9769.99
rs3817198   1607       0.337 (0.007)  0.94 (0.84-1.06)  0.330    1.12 (0.99-1.27)  0.081    0.170 (0.108)   2.468  0.116    -5450.369  10906.74
rs889312    1605       0.283 (0.007)  1.01 (0.90-1.15)  0.811    1.14 (1.01-1.30)  0.042    0.119 (0.111)   1.152  0.283    -5180.945  10367.89
rs1011970   1628       0.172 (0.006)  1.05 (0.91-1.21)  0.489    0.99 (0.84-1.17)  0.912    -0.059 (0.136)  0.186  0.666    -4031.07   8068.14
rs17468277  1609       0.141 (0.005)  0.90 (0.72-1.14)  0.389    0.85 (0.65-1.10)  0.224    -0.062 (0.233)  0.072  0.788    -3351.337  6708.67
rs999737    1566       0.743 (0.007)  1.09 (0.95-1.25)  0.242    1.11 (0.94-1.30)  0.219    0.018 (0.136)   0.018  0.894    -4767.749  9541.50
rs2380205   1599       0.412 (0.010)  0.94 (0.76-1.16)  0.587    1.19 (0.95-1.49)  0.128    0.233 (0.210)   1.100  0.294    -2370.253  4746.51
rs4973768   1607       0.462 (0.008)  1.09 (0.96-1.23)  0.168    1.08 (0.94-1.24)  0.269    -0.009 (0.119)  0.006  0.937    -5904.616  11815.23
rs6504950   1609       0.288 (0.007)  1.02 (0.90-1.16)  0.724    0.94 (0.81-1.08)  0.379    -0.088 (0.122)  0.514  0.473    -5224.443  10454.89
rs2363956   1548       0.508 (0.008)  1.04 (0.92-1.19)  0.506    0.99 (0.86-1.14)  0.914    -0.051 (0.124)  0.170  0.680    -5362.528  10731.06
rs8170      1549       0.182 (0.006)  1.10 (0.95-1.27)  0.194    0.98 (0.82-1.17)  0.812    -0.116 (0.146)  0.620  0.431    -4062.888  8131.78
rs12662670  1549       0.076 (0.004)  1.26 (1.08-1.46)  0.004    1.08 (0.88-1.34)  0.460    -0.148 (0.161)  0.854  0.355    -2572.856  5151.71

Abbreviations: AIC = Akaike Information Criterion (Akaike, 1974); CI = confidence interval; χ² = 1 df test statistic based on likelihood ratio test between the POE model and the standard major
gene segregation model; δ = difference between maternal and paternal log HRs; HR = hazard ratio; logL = model maximum log-likelihood; RAF = risk allele frequency.
a RAF is the maximum likelihood estimate of the risk allele frequency from the segregation analysis model.
b Per-allele HR is reported such that the effect allele is the same as those from the population-based studies (Supplementary Table 1).

Figure 3. (A) Density plots of the ORS based on 22 SNPs for women with FH of BC (n = 432) and controls (n = 715); the x-axis is the observed risk score. (B) ROC curve for the ability
of the ORS based on 22 SNPs to discriminate between cases with FH and controls. The x-axis is 1 − specificity (false-positive rate) and the y-axis is
the sensitivity (true-positive rate). The dashed line represents an AUC of 0.50, indicating prediction no better than chance alone.



After accounting for ascertainment and the residual polygenic
variance, the RR estimates for the known common BC susceptibility
alleles were similar to those obtained from population-based
case-control studies (Cox et al, 2007; Easton et al, 2007; Stacey
et al, 2007, 2010; Ahmed et al, 2009; Thomas et al, 2009; Antoniou
et al, 2010; Turnbull et al, 2010; Fletcher et al, 2011; Milne et al,
2011; Ghoussaini et al, 2012; Hein et al, 2012). This observation
suggests that the polygenic model of inheritance provides a good fit
to the observed familial aggregation of BC, and it has two
implications. First, the residual genetic susceptibility to BC is
unlikely to be due to genes conferring large contributions to the
familial risk of the disease, of magnitude similar to that of BRCA1
or BRCA2 mutations. Instead, the residual genetic variability is
likely to be due to genetic effects that make small contributions to
the BC familial risk, that is, either common alleles conferring low
risks or rare variants conferring moderate risks. Second, our
findings suggest a general model of genetic susceptibility in which
the joint effects of the common alleles studied here and other, as
yet unidentified, BC susceptibility variants are multiplicative. We
can therefore infer that interactions between the studied common
alleles and other residual genetic effects are unlikely.
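
As a minimal sketch of what a multiplicative joint model means in practice, the snippet below combines per-allele RRs across loci by multiplying them, which is the same as adding log RRs with no interaction terms. The function name, the per-allele RRs and the genotype are hypothetical illustrations, not estimates or code from this study, and the optional residual term is only a stand-in for the polygenic component.

```python
import math

def combined_relative_risk(per_allele_rr, allele_counts, residual_log_rr=0.0):
    """Joint RR under a multiplicative (log-additive) model.

    per_allele_rr   : per-allele relative risk at each locus
    allele_counts   : number of risk alleles carried at each locus (0, 1 or 2)
    residual_log_rr : optional log RR from unmeasured (polygenic) effects
    """
    log_rr = sum(n * math.log(rr) for rr, n in zip(per_allele_rr, allele_counts))
    return math.exp(log_rr + residual_log_rr)

# Hypothetical per-allele RRs at three loci and one woman's risk-allele counts.
per_allele_rr = [1.26, 1.20, 0.90]
allele_counts = [2, 1, 0]

joint_rr = combined_relative_risk(per_allele_rr, allele_counts)
print(f"joint RR = {joint_rr:.3f}")  # equals 1.26 * 1.26 * 1.20
```

Under this model the effect of each allele does not depend on the genotype at any other locus, which is the sense in which interactions are absent.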

The pedigree RL was adapted to estimate parent-specific genetic
effects for each common allele. This was achieved by separately
estimating the risks for maternally and paternally inherited risk
alleles. Although other methods have been suggested for evaluating
POE, those involve direct genotyping of parents and offspring, and
they may not make full use of multigenerational pedigree data or
may not adjust adequately for ascertainment (Haghighi and Hodge,
2002; Belonogova et al, 2009; Kong et al, 2009; Feng et al, 2011; He
et al, 2011; Li et al, 2011).
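
The quantities reported for the POE models in Table 4 can be cross-checked with a few lines of arithmetic. The sketch below uses the rs3817198 row and makes three assumptions that the text does not state: that the difference is taken as paternal minus maternal log HR (inferred from the signs in the table), that a Wald statistic is a reasonable approximation to the reported likelihood ratio statistic, and that the POE model has three free parameters (inferred because it reproduces the tabulated AIC). Function names are illustrative.

```python
import math

def poe_difference(hr_maternal, hr_paternal):
    """Difference in log HRs (paternal minus maternal; sign convention assumed)."""
    return math.log(hr_paternal) - math.log(hr_maternal)

def chi2_1df_pvalue(chi2):
    """Upper-tail P-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))

def aic(log_lik, n_params):
    """Akaike Information Criterion: 2k - 2 logL."""
    return 2 * n_params - 2 * log_lik

# rs3817198 row of Table 4.
delta = poe_difference(hr_maternal=0.94, hr_paternal=1.12)
wald_chi2 = (0.170 / 0.108) ** 2    # from the reported difference and its s.e.
p_lrt = chi2_1df_pvalue(2.468)      # from the reported likelihood ratio statistic
aic_value = aic(-5450.369, n_params=3)  # n_params=3 is an assumption

print(f"delta ~ {delta:.3f}, Wald chi2 ~ {wald_chi2:.2f}, "
      f"P ~ {p_lrt:.3f}, AIC ~ {aic_value:.2f}")
```

The recomputed difference is slightly larger than the tabulated value because it starts from HRs rounded to two decimal places.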

Our analyses suggested no significant differences between the
estimated HRs for maternally and paternally inherited alleles for
any of the 24 SNPs. The LSP1 SNP rs3817198 had previously been
shown to display POE with BC risk, where the paternally inherited
allele was associated with increased BC risk (OR = 1.17, 95% CI:
1.05-1.30, P = 0.0038) (Kong et al, 2009). Kong et al also found a
decreased BC risk if the risk allele was maternally inherited, but
this was not significant (OR = 0.91, 95% CI: 0.81-1.02, P = 0.11).
The magnitude and direction of our estimates for this SNP are
comparable to those reported by Kong et al (paternal HR = 1.12,
95% CI: 0.99-1.27, P = 0.081; maternal HR = 0.94, 95% CI:
0.84-1.06, P = 0.33). Our analyses did not detect a significant
difference between the maternal and paternal effects (P = 0.11).
This is possibly because of the much greater sample size employed by
Kong et al: 34909 controls and 1803 BC cases, all of whom were
genotyped or had imputed genotype data available. Our analyses
included 5251 unaffected individuals and 1463 BC cases. It is worth
noting that the paternal HR for LSP1 SNP rs3817198 was
significant under a one-sided test of the hypothesis that the
paternal HR > 1 (P = 0.04). We meta-analysed our LSP1 SNP RR
estimates with those reported by Kong et al (Supplementary
Table 2). The meta-analysis yielded a maternal RR = 0.93 (95% CI:
0.85-1.01, P = 0.066) and a paternal RR = 1.15 (95% CI: 1.06-1.24,
P = 7.8 × 10⁻⁴). These analyses suggest no association with the
maternally inherited C allele but provide stronger evidence of
association with the paternally inherited C allele. Although no
significant differences were observed between the estimates for the
paternally and maternally inherited alleles at other loci, we
observed associations for several SNPs with either the maternally
or the paternally inherited allele. The current approach for
evaluating POE could potentially be useful in fine-mapping efforts
at these loci to determine causal variants.
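
The text does not say how the two sets of LSP1 estimates were pooled. As a hedged illustration, the sketch below assumes a fixed-effect inverse-variance meta-analysis on the log scale, with standard errors recovered from the published 95% CIs. Under that assumption the pooled values come out close to, though not identical with, the reported maternal RR of 0.93 and paternal RR of 1.15; small differences are expected because the inputs are rounded. Function names are illustrative.

```python
import math

Z95 = 1.96  # normal quantile for a two-sided 95% CI

def log_rr_and_se(rr, ci_low, ci_high):
    """Recover log(RR) and its standard error from a 95% CI."""
    return math.log(rr), (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)

def fixed_effect_pool(estimates):
    """Inverse-variance fixed-effect pooling of (log_rr, se) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    p_value = math.erfc(abs(pooled / pooled_se) / math.sqrt(2))  # two-sided
    ci = (math.exp(pooled - Z95 * pooled_se), math.exp(pooled + Z95 * pooled_se))
    return math.exp(pooled), ci, p_value

# rs3817198: Kong et al (2009) estimate followed by the present study's estimate.
paternal = [log_rr_and_se(1.17, 1.05, 1.30), log_rr_and_se(1.12, 0.99, 1.27)]
maternal = [log_rr_and_se(0.91, 0.81, 1.02), log_rr_and_se(0.94, 0.84, 1.06)]

pat_rr, pat_ci, pat_p = fixed_effect_pool(paternal)
mat_rr, mat_ci, mat_p = fixed_effect_pool(maternal)
print(f"paternal RR = {pat_rr:.2f} ({pat_ci[0]:.2f}-{pat_ci[1]:.2f}), P = {pat_p:.2g}")
print(f"maternal RR = {mat_rr:.2f} ({mat_ci[0]:.2f}-{mat_ci[1]:.2f}), P = {mat_p:.2g}")
```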

Recent studies have estimated the ROC AUC to investigate how well
SNPs discriminate between affected and unaffected women. Wacholder
et al (2010) used a modified Gail model to demonstrate an increase
in AUC from 0.580 to 0.618 when the effects of the 10 genetic
variants then known to be associated with BC risk were incorporated
into the model. Sawyer et al (2012) have described the largest AUC
(0.654, 95% CI: 0.628-0.680) based purely on genetic factors. Their
analyses included 22 genetic variants in women with FH of BC in the
absence of a known BRCA1 or BRCA2 mutation. We describe a similar
AUC when considering the ORS as the sole risk predictor for
individuals genotyped for all 22 SNPs. This is consistent with the
fact that women with FH of BC are expected to have a higher
polygenic load due to familial aggregation of the disease. It
suggests that a high polygenic score in combination with a FH of the
disease, rather than SNPs alone, could provide a way to identify
those who may be at higher risk of developing the disease.
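
For readers unfamiliar with the AUC, the sketch below computes it directly from its rank-based definition: the proportion of case-control pairs in which the case has the higher score, counting ties as one half. The scores are hypothetical toy values, not ORS data from this study, and the function name is illustrative.

```python
def auc_from_scores(case_scores, control_scores):
    """AUC as the probability that a random case outscores a random control."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # ties contribute one half
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical risk scores for three cases and three controls.
cases = [0.2, 0.5, 0.9]
controls = [0.1, 0.4, 0.5]

auc = auc_from_scores(cases, controls)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.50 corresponds to scores that carry no information about case status, and 1.0 to perfect separation of cases from controls.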

In summary, we have presented a novel analytical framework
for evaluating associations between common genetic variants and
disease risk that harnesses the power and efficiency of family data.
Although the methods have been presented in the context of BC
susceptibility, the general principles are applicable to other cancers
and other complex diseases that have a heritable component. We
applied these techniques to data on common susceptibility alleles,
although, in principle, the methods could be applied to analyse rare
variants conferring moderate cancer risks. We have further
demonstrated that combined SNP profiles discriminate BC-affected
status more effectively in individuals with FH of the disease than
in the general population, taking us closer to the goal of
incorporating SNP profiling into clinical practice.